knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(magneto)
library(rtiff)
library(tiff)
rawTiff <- "AGC-D-19040830-19040901.tif"

This package is intended to be used on scanned images in .tif or .tiff format. Other formats can be used if another image importing function is created by the user. The functions shown below are meant to be building blocks for digitization of many types of images plotted against time. TIS (Trace Identification through Separation) and TISI (Trace Identification through Separation Improvement) are intended, in the default state, to be only used for magnetogram images; that being said, all parameters can be altered for use in other applications. It is recommended to find parameters for TIS and TISI by using the other mentioned functions below.

Worked example with a magnetogram, following the process used by TIS and TISI

Both TIS and TISI use 10 main functions to do the image processing. Note: all functions mentioned are available from the NAMESPACE for use by anyone.

import_process_image()

Used to change orientation of an image, as well processes the image into a matrix of pixels. These pixels are all 0 (black) or 1 (white). The pixels being 0 or 1 is very important for the rest of the digitization process!

Original <- readTIFF("Images/AGC-D-19040907-19040909.tif")
par(mar = c(0,0,0,0))
plot(Original)

As seen from the image above, the image is in the incorrect orientation for processing and viewing. It can also be seen that parts of the two right lines are inconsistent, and fade in and out in the bottom quarter of the plot. The image processing, below, fixes these issues, as well as well as increasing the contrast to produce only fully white (1) or fully black pixels (0). This decision comes down to thresholdBright and thresholdNonBright, described above.

processedImage <- import_process_image(imageName = "AGC-D-19040907-19040909.tif",
                                       file_loc = "Images/")
par(mar = c(0,0,0,0))
plot(processedImage)

Note on beta1 and beta0 parameters in the function.

The image below is an example of an overexposed image, as a consequence of the scanning process. This is where beta1 and beta0 are used.

knitr::include_graphics("Images/brightImage.png")

Taking a sample of 392 images, we created a vector of 1's (bright) or 0's(not bright) based off of a decision if an image was bright or not using a list of criteria:

If two or more of these were true, we answered yes to the image being bright.
Once this vector was established, there was a need to find a way of predicting the outcome; A logistic regression model was used for this, as it gave us the probability of the image being bright. The response(our decisions about the 392 images) was modeled using the glm() in R:

$$ \text{Bright Decision} = \frac{e^{-2.7743 + 51.9169Standard}}{(1+e^{-2.7743 + 51.9169Standard})} $$

with,

$$ \text{Standard} = \frac{\text{The pixels in the image above 80th percentile}}{\text{Total number of pixels}} $$ These, beta0 and beta1, will need to be configured for your respective data set if overexposed images are a problem.

trim_sides()

Used to remove unwanted matrix columns based off of a percentage of the image the user wants to remove. This is mainly used for removing overexposed edges that can come from the scanning process. Removal of any overexposure creates a much more accurate digitization.

Using the imported image from the last example:

For the magneto images, it is possible to remove 2 percent from either side of the image, with no loss of data. In functions like TIS(), this is done on every image.

rightLeftTrimmed <- trim_sides(image = processedImage, trimAmountLeft = 2,
                               trimAmountRight = 2)
par(mar = c(0,0,0,0))
plot(rightLeftTrimmed)

From the plot above (Fig 4), there is no flair on the right side of the image. This flair can be seen in (Fig 2)

trim_top_bottom()

Used to remove unwanted matrix rows on the image. This is mainly used for removing overexposed edges that can come from the scanning process. Removal of any overexposure creates a much more accurate digitization. NOTE: In (Fig 4) above, there is overexposure on the bottom and the top; but due to the white background it is hard to differentiate between the two.

For magneto images, it is possible to remove 100 pixels from the top, and 50 from the bottom. These are different, just due to where the flares were commonly found on the images.

topBottomTrimmed <- trim_top_bottom(image = rightLeftTrimmed,
                                    trimAmountTop = 100,
                                    trimAmountBottom = 50)
par(mar = c(0,0,0,0))
plot(topBottomTrimmed)

From Fig 5 above, you can see that part of the bottom line is cut off from the trimming. In our case, this is acceptable since we only digitize the top two lines. NOTE: by design, this package only digitizes two lines, no function has been created yet to digitize more.

find_cuts()

Used to isolate the two trace lines (the top two lines in Fig 5) that we want; while removing the words and the timing lines (the bottom two lines in Fig 5) that we don't want.

find_cuts() will return a list of two: - $TopCut - Integer of where the image should be trimmed on the top. - $BottomCut - Integer of where the image should be trimmed on the bottom.

These values are with respect to the matrix, so plotting won't show you the correct lines. You must trim the image first, then plot.

cutsForImage <- find_cuts(imageMatrix = topBottomTrimmed, 
                      cutPercentage = 2,
                      percentEdgeForLeft = 25,
                      percentFromEdge = 5)
trimForPlotting <- topBottomTrimmed[-c(0:cutsForImage$TopCut,
                                  cutsForImage$BottomCut:nrow(topBottomTrimmed))
                                  ,]
par(mar = c(0,0,0,0))
plot(trimForPlotting)

As seen from Fig 6, the image doesn't have the writing or the timing lines. Thus, the only thing left is the two lines that we want to trace.

mean_roll_image()

In most images in magneto, there are timing gaps in the traces, shown in Fig 7 below. These cause an issue when digitizing, because the algorithm couldn't go between the gaps and the writing at the ends of the traces. To fix this we mean rolled the images, to remove the gaps entirely.

knitr::include_graphics("Images/magnetoWithGaps.png")

The number for k were found to be the optimal for magneto. The goal is to remove the gaps, without blurring the image too much.

NOTE: mean_roll_image() trims the matrix automatically with the cuts found above.

meanRolled <- mean_roll_image(imageMatrix = topBottomTrimmed,
                              topcut = cutsForImage$TopCut,
                              bottomcut = cutsForImage$BottomCut,
                              fill = "extend",
                              k = 40)
par(mar = c(0,0,0,0))
plot(meanRolled)

The image is now mean rolled, so no gaps are found in each of the traces. It has also been trimmed using the top and bottom cuts, found in above step.

find_envelopes()

This function creates four lines, essentially isolating one trace in between two lines. These envelopes can be specified by the user if necessary, by using the improve arguments listed below.

I will do two runs of this, one with MatrixScaled that we can use for the rest of the digitization process, and another that is PlottingScaled, so you can see what is happening.

envelopesMatrixScaled <- find_envelopes(imageMatrix = topBottomTrimmed,
                                        rolledImage = meanRolled,
                                        bottomCut = cutsForImage$BottomCut,
                                        topCut = cutsForImage$TopCut,
                                        returnType = "MatrixScaled")

envelopesplotScaled <- find_envelopes(imageMatrix = topBottomTrimmed,
                                        rolledImage = meanRolled,
                                        bottomCut = cutsForImage$BottomCut,
                                        topCut = cutsForImage$TopCut,
                                        returnType = "PlottingScaled")
par(mar = c(0,0,0,0))
plot(topBottomTrimmed)
lines(envelopesplotScaled$TopEnvelope, col = "red")
lines(envelopesplotScaled$TopLowerEnvelope, col = "green")
lines(envelopesplotScaled$BottomUpperEnvelope, col = "yellow")
lines(envelopesplotScaled$BottomEnvelope, col = "orange")

These four lines will be used to now isolate the two traces entirely.

isolate_traces()

This function takes the envelopes and the imported image, to isolate the two lines entirely.

It is very important that you use the "MatrixScaled" envelopes for this function! Not the plot scaled versions. This function returns a list of the two matrices, each having one line contained in it.

isolatedTraces <- isolate_traces(imageMatrix = topBottomTrimmed,
                                 topEnvelope = envelopesMatrixScaled$TopEnvelope,
                                 topLowerEnvelope = 
                                   envelopesMatrixScaled$TopLowerEnvelope,
                                 bottomUpperEnvelope = 
                                   envelopesMatrixScaled$BottomUpperEnvelope,
                                 bottomEnvelope = 
                                   envelopesMatrixScaled$BottomEnvelope)
par(mar = c(0,0,0,0))
plot(isolatedTraces$TopTraceMatrix)
par(mar = c(0,0,0,0))
plot(isolatedTraces$BottomTraceMatrix)

Now, with these two isolated envelopes, the start and ends need to be found, for both the top and bottom traces.

env_start_end()

Used for obtaining the actual start and end points for the image. It should be noted that if an incorrect start and end is found, the rest of the process will still likely work, then the traces can be trimmed afterwords to the correct start and ends.

Running this function on both the TopTraceMatrix, and the BottomTraceMatrix:

TopStartEnds <- env_start_end(isolatedTraces$TopTraceMatrix,
                              thresh = 300,
                              gapLengthCutoff = 10,
                              returnMatrix = FALSE)

BottomStartEnds <- env_start_end(isolatedTraces$BottomTraceMatrix,
                              thresh = 300,
                              gapLengthCutoff = 10,
                              returnMatrix = FALSE)
par(mar = c(0,0,0,0))
plot(isolatedTraces$TopTraceMatrix)
abline(v = TopStartEnds$Start, col = "red", cex = 2)
abline(v = TopStartEnds$End, col = "green", cex = 2)

Notice that in this case, it found the correct starting point after the writing that was at the start of this image.

create_Trace()

This function is designed to create the trace line. It uses MA (moving average) smoothing to ensure as smooth as possible of a trace.

Creating trace lines for the two isolated images:

topTrace <- create_trace(isolatedTraces$TopTraceMatrix,
                         start = TopStartEnds$Start,
                         end = TopStartEnds$End,
                         topEnv = envelopesMatrixScaled$TopEnvelope,
                         bottomEnv = envelopesMatrixScaled$TopLowerEnvelope)

bottomTrace <- create_trace(isolatedTraces$BottomTraceMatrix,
                         start = BottomStartEnds$Start,
                         end = BottomStartEnds$End,
                         topEnv = envelopesMatrixScaled$BottomEnvelope,
                         bottomEnv = envelopesMatrixScaled$BottomLowerEnvelope)
plot(topTrace, type = "l", ylim = c(0,1000),
     ylab = "Trace Heights",
     xlab = "column index")
lines(bottomTrace)

It should be noted, if height is important for your data set, you will still need to scale these back to the original image size!

TIS()

This function allows the computer to process the entire image, with all functions listed above. I would recommend in order to find all parameters, process some images manually, then input those parameters into TIS to automate the digitization process.

As an example, I digitized the image used throughout the process above, using TIS()

imageData <- TIS(imageName = "AGC-D-19040907-19040909.tif", fileLoc = "Images/",
                 gapLengthCutOff = 10, plotPNG = FALSE, saveData = FALSE)
plot(imageData$TopTraceMatrix, type = "l", ylim = c(0,1000),
     ylab = "Trace Heights",
     xlab = "column index")
lines(imageData$BottomTraceMatrix)

Along with the two trace matrices, the TIS() returns the same information that we obtained above manually. This could be beneficial if needed for other applications. If plotPNG is TRUE in the TIS(), the function will automatically generate a plot and save it to the directory specified by the parameter pathToWorkingDir. If saveData is TRUE, it will also save all the generated data to a .RDS file in that same location. TISI() is meant for improving a failed digitization with an external application. TISI() differs from TIS() as it possesses the ability to specify any parameter needed for the digitization, thus, allowing digitization of images that fail using TIS().



Bott-binc/magnetoPackage2020 documentation built on Oct. 12, 2024, 7:49 p.m.